allCode.jl
. It's been tested under Julia 1.9
.For more accurate benchmarks, each variable is wrapped in
ref(x) = (Ref(x))[]
and interpolated through $
. Action | Keyboard Shortcut |
---|---|
Open All Codes and Outputs in a Post | Ctrl + π |
Close All Codes and Outputs in a Post | Ctrl + π |
Close Any Popped Up Window (like this one) | Esc |
Previous Post | Ctrl + π |
Next Post | Ctrl + π |
List of Sections in a Post | Ctrl + x |
List of Keyboard Shortcuts | Ctrl + z |
allCode.jl
. It's been tested under Julia 1.9
.ref(x) = (Ref(x))[]
and interpolated through $
. Readability and performance are two crucial features in coding. However, they're not always compatible, leading you to prioritize one over the other. In this post, we'll explore a tool that enables you to enhance performance without compromising readability: lazy broadcasting.
In programming, the term lazy refers to postponing the execution of some code until the result is required. The opposite is eager, which means executing the code as soon as the computer reads it. When it comes to broadcasting, Julia performs it eagerly by default. This post will show how lazy broadcasting can enhance performance.
Lazy broadcasting is useful for computing an output that requires several intermediate steps, as long as intermediate results are unimportant and we don't need to keep them. This improves performance by i) avoiding memory allocations for intermediate results, [note] Strictly speaking, we're referring to heap allocations. The concept involves several technical aspects that are not necessary for our purposes. See here if you want to know why this matter for performance, and here for an intuitive explanation of heap and stack allocations. and ii) combining operations into a single one, which enables Julia to perform further optimizations.
To facilitate the explanation, let's consider a concrete example. This is based on Economics, but don't worry if you're not an economist, as it doesn't require any specialized knowledge.
The goal is to compute a company's profit for a given price of its product. Profit is defined as the difference between revenue and cost. In turn, revenue is the product of price and quantity, whereas cost is the cost per unit times the quantity produced. Therefore, if our aim is to compute profits for some given prices, we need to compute quantities, revenues, and costs as intermediate steps.
Formally, given some price, we need to compute the quantity demanded: q(p):=10β2.1Γp
which makes it possible to compute the revenue and costs: r(q,p):=qΓp c(q):=1.1Γq
providing us with the profit Ο(r,c):=rβc
Our task at hand is to compute (4), with (1), (2), and (3) as intermediate results we won't eventually need.
The following packages are required for the code in this post. Note in particular the use of LazyArrays
, which is what allows us to use lazy broadcasting.
using DataFrames, Distributions, Random, BenchmarkTools
using LazyArrays
The code below generates mock data for prices. It additionally creates functions in Julia representing equations (1), (2), (3), and (4).
Random.seed!(123) # setting the seed for reproducibility
price_vector = 2 .* rand(1_000)
quantity(price) = 10 - 2.1 * price
revenue(quantity, price) = quantity * price
cost(quantity) = 1.1 * quantity
profit(revenue, cost) = revenue - cost
We now compare the performance of two broadcasting approaches. The first one is the default approach, which performs broadcasting eagerly. The second one performs the broadcast lazily, which we indicate by adding the macro @~
. [note] The benchmarks in this post wrap each variable into the function ref(x) = (Ref(x))[]
. This prevents the computer from realizing that we're repeating the same calculation. Otherwise, @btime
would directly repeat stating the result after a few iterations rather than computing it, thus distorting the comparison.
function compute_profit1(price)
compute_quantity = quantity.(price)
compute_revenue = revenue.(compute_quantity, price)
compute_cost = cost.(compute_quantity)
profit.(compute_revenue, compute_cost)
end
@btime compute_profit1(ref($price_vector));
1.070 ΞΌs (4 allocations: 31.75 KiB)
function compute_profit2(price)
compute_quantity = @~ quantity.(price)
compute_revenue = @~ revenue.(compute_quantity, price)
compute_cost = @~ cost.(compute_quantity)
profit.(compute_revenue, compute_cost)
end
@btime compute_profit2(ref($price_vector));
233.333 ns (1 allocation: 7.94 KiB)
As 1 ΞΌs is 1,000 ns, the lazy approach is faster. This occurs because the default broadcasting performs eager computations. Thus, compute_profit1
stores the results for quantities, revenue, and cost, and then computes the profit with those results. The procedure ends up performing 4 allocations: three for the intermediate steps, and one for the output of interest (i.e., profit).
In contrast, compute_profit2
indicates Julia that quantities, revenue, and cost should be calculated lazily. It's like saying to Julia: "do not compute anything at this point, let me only describe the operation involved". This allows Julia to avoid the allocations for the intermediate results, fuse all the operations into a single one, and provide the profit as the outcome. This explains why we've reduced 4 allocations to 1, where the only allocation is storing the vector of profits and hence is unavoidable.
x = [2,3,4]
eager(x) = sum(x .* x .* log.(x))
@btime eager(ref($x))
30.955 ns (1 allocation: 80 bytes)
x = [2,3,4]
lazy(x) = sum(@~(x .* x .* log.(x)))
@btime lazy(ref($x))
29.819 ns (0 allocations: 0 bytes)
A lazy approach is only preferred when the intermediate results aren't needed for any other computation. If, for instance, quantities are necessary to computer other results besides profits, storing them will likely improve performanceβthe detrimental effect of storing the result would be more than compensated for by avoiding the computation of quantities multiple times.
Furthermore, lazy broadcasting should only be applied to performance-critical parts of the code, or when a function is repeatedly used across your projects. Otherwise, the time spent on writing the code might not be worth the performance gain.
Lazy broadcasting could also be useful if you have a tight loop or, in the same vein, for computations on subgroups in a DataFrame. To illustrate this, let's consider a hypothetical dataset containing information on companies, spanning several years and countries. In particular, let's create some mock data for years 2000-2020 and 50 countries, each with 10,000 companies. This results in a DataFrame with 10,500,000 observations. If you're interested in seeing how the data was constructed, you can open the code below.
#80-20 Pareto Rule within country-year
prices_random(nr_firms, nr_countries) = [rand(Pareto(log(4, 5)), nr_firms) for _ in 1:nr_countries]
function generate_yearly_data(year, nr_firms, nr_countries)
revenue = prices_random(nr_firms, nr_countries)
dff = DataFrame(country=1:nr_countries, price = revenue)
dff = flatten(dff,2)
dff.firm = repeat(1:nr_firms, outer = nr_countries)
dff.year .= year
return dff
end
function generate_data(nr_years, nr_firms, nr_countries)
dff = generate_yearly_data(2000, nr_firms, nr_countries)
for year in 2001:nr_years-1+2001
append!(dff, generate_yearly_data(year, nr_firms, nr_countries))
end
return select!(dff, [:year, :country, :firm, :price])
end
dff = generate_data(nr_years, nr_firms, nr_countries) #it generates the DataFrame
Ultimately, what matters is that the DataFrame looks like the following.
5Γ4 DataFrame
Row β year country firm price
β Int64 Int64 Int64 Float64
ββββββΌββββββββββββββββββββββββββββββββ
1 β 2000 1 1 2.75317
2 β 2000 1 2 4.54952
3 β 2000 1 3 2.38347
4 β 2000 1 4 1.35309
5 β 2000 1 5 1.72773
Suppose we want to compute the mean profit for each country and year. This requires performing the same calculation nr_countriesΓnr_years=1,050 times. In contrast to the previous example, each firm's profit is not of interest anymore, and so we can avoid storing this result too. The following comparison shows an eager and a lazy broadcast approach for computing the mean profit.
dfg = groupby(dff,[:country,:year])
function average_profit1(price)
compute_quantity = quantity.(price)
compute_revenue = revenue.(compute_quantity, price)
compute_cost = cost.(compute_quantity)
compute_profit = profit.(compute_revenue, compute_cost)
mean(compute_profit)
end
@btime transform(ref(dfg), ref(:price) => average_profit1 => :profit)
233.661 ms (9203 allocations: 1.10 GiB)
dfg = groupby(dff,[:country,:year])
function average_profit2(price)
compute_quantity = @~ quantity.(price)
compute_revenue = @~ revenue.(compute_quantity, price)
compute_cost = @~ cost.(compute_quantity)
compute_profit = @~ profit.(compute_revenue, compute_cost)
mean(compute_profit)
end
@btime transform(ref(dfg), ref(:price) => average_profit2 => :profit)
164.001 ms (794 allocations: 801.15 MiB)